Abstract

This paper deals with the stochastic control of nonlinear systems in the presence of state and control constraints, for uncertain discrete-time dynamics in finite dimensional spaces. In the deterministic case, the viability kernel is known to play a basic role for the analysis of such problems and the design of viable control feedbacks. In the present paper, we show how a stochastic viability kernel and viable feedbacks relying on probability (or chance) constraints can be defined and computed by a dynamic programming equation. An example illustrates most of the assertions.

Key words: stochastic control, state constraints, viability, discrete time, 
dynamic programming. 



1 Introduction 

Risk, vulnerability, safety or precaution constitute major issues in the management and control of dynamical systems. Regarding these motivations, the role played by acceptability constraints or targets is central, and it has to be articulated with uncertainty and, in particular, with stochasticity when a probability distribution is given. The present paper addresses the issue of state and control constraints in the stochastic context. For the sake of simplicity, we consider noisy controlled dynamical systems. This is a natural extension of deterministic control systems, which covers a large class of situations. Thus we consider the following state equation as the uncertain dynamic model:

$$x(t+1) = f\big(t, x(t), u(t), w(t)\big), \quad t = t_0, \dots, T-1, \quad \text{with } x(t_0) = x_0, \tag{1}$$

where $x(t) \in \mathbb{X} = \mathbb{R}^n$ represents the system state vector at time $t$, $x_0 \in \mathbb{X}$ is the initial condition at initial time $t_0$, $u(t) \in \mathbb{U} = \mathbb{R}^p$ represents the decision or control vector, while $w(t) \in \mathbb{W} = \mathbb{R}^q$ stands for the uncertain variable, or disturbance, or noise.

The admissibility of decisions and states is first restricted by a non empty subset $\mathbb{B}(t, x)$ of admissible controls in $\mathbb{U}$ for all $(t, x)$:

$$u(t) \in \mathbb{B}\big(t, x(t)\big) \subset \mathbb{U}. \tag{2}$$

Similarly, the relevant states of the system are limited by a non empty subset $\mathbb{A}(t, w(t))$ of the state space $\mathbb{X}$, possibly uncertain, for all $t$:

$$x(t) \in \mathbb{A}\big(t, w(t)\big) \subset \mathbb{X}, \tag{3}$$

and by a target

$$x(T) \in \mathbb{A}\big(T, w(T)\big) \subset \mathbb{X}. \tag{4}$$

We assume that

$$w(t) \in \mathbb{S}(t) \subset \mathbb{W}, \tag{5}$$

so that the sequences

$$w(\cdot) := \big(w(t_0), w(t_0+1), \dots, w(T-1), w(T)\big) \tag{6}$$

belonging to

$$\Omega := \mathbb{S}(t_0) \times \cdots \times \mathbb{S}(T) \subset \mathbb{W}^{T+1-t_0} \tag{7}$$

capture the idea of possible scenarios for the problem. A scenario is an uncertainty trajectory.

These control, state or target constraints may reduce the relevant paths of the system (1). Such a feasibility issue can be addressed in a robust or stochastic framework. Here we focus on the stochastic case, assuming that the domain of scenarios $\Omega$ is equipped with some probability $\mathbb{P}$. In this probabilistic setting, one can relax the constraint requirements (2)-(3)-(4) by satisfying the state constraints along time with a given confidence level $\beta$:

$$\mathbb{P}\Big(\big\{w(\cdot) \in \Omega \;\big|\; x(t) \in \mathbb{A}\big(t, w(t)\big) \text{ for } t = t_0, \dots, T\big\}\Big) \geq \beta, \tag{8}$$

by appropriate controls satisfying (2). Such probabilistic constraints are often called chance constraints in the stochastic literature, as in [14, 16]. We shall give proper mathematical content to the above formula in the following section.
Concentrating now on motivation, the idea of stochastic viability is basically to require the respect of the constraints at a given confidence level $\beta$ (say 90% or 99%). It implicitly assumes that some extreme events make irrelevant the robust approach [12], which is closely related to the stochastic approach with a confidence level of 100%.

Problems of dynamic control under constraints usually refer to the viability [1] or invariance [9, 17] frameworks. Basically, such an approach focuses on intertemporal feasible paths. From the mathematical viewpoint, most viability and weak invariance results are addressed in the continuous time case. However, some mathematical works deal with the discrete-time case. This includes the study of numerical schemes for the approximation of the viability problems of continuous dynamics, as in [1, 15]. Important contributions for the discrete-time case are also captured by the study of positivity for linear systems, as in [4], or by hybrid control, as in [2, 17] or [11]. Other references may be found in the control theory literature, such as [5, 13] and the survey paper [6]. An extensive study focusing on the discrete-time case is also provided in [10].

Viability is defined as the ability to choose, at each time step, a control such that the system configuration remains admissible. The viability kernel associated with the dynamics and the constraints plays a major role regarding such issues. It is the set of initial states $x_0$ from which an acceptable solution starts. For a decision maker or control designer, knowing the viability kernel has practical interest since it describes the states from which controls can be found that maintain the system in a desirable configuration forever. However, computing this kernel is not an easy task in general. Of major interest is the fact that a dynamic programming equation underlies the computation or approximation of viability kernels, as pointed out in [1, 10].

The present paper aims at extending viability concepts and results to the stochastic case for discrete-time systems. In particular, we adapt the notions of viability kernel and viable controls to the probabilistic or chance constraint framework. Mathematical results on stochastic viability can be found in [3, 8, 7], but they rather focus on the continuous time case and cope with constraints satisfied almost surely. We here provide a dynamic programming and Bellman perspective for the probabilistic framework.

The paper is organized as follows. Section 2 is devoted to the statement of the probabilistic viability problem. Then, Section 3 exhibits the dynamic programming structure underlying such stochastic viability. An example is presented in Section 4 to illustrate some of the main findings.

2 The stochastic viability problem 

Here we address the issue of state constraints in the probabilistic sense. This is basically related to risk assessment and management, and it requires specific tools inspired by the viability and invariance approaches known for the deterministic case. In particular, within the probabilistic framework, we adapt the notions of viability kernel and viable controls.

2.1 Probabilistic assumptions and expected value 

Probabilistic assumptions on the uncertainty $w(\cdot) \in \Omega$ are now added, giving the problem a stochastic nature. Mathematically speaking, we suppose that the domain of scenarios $\Omega \subset \mathbb{W}^{T+1-t_0} = \mathbb{R}^q \times \cdots \times \mathbb{R}^q$ is equipped with a $\sigma$-field$^{1}$ $\mathcal{F}$ and a probability $\mathbb{P}$: thus, $(\Omega, \mathcal{F}, \mathbb{P})$ constitutes a probability space. The sequences

$$w(\cdot) = \big(w(t_0), w(t_0+1), \dots, w(T-1), w(T)\big) \in \Omega$$

now become the primitive random variables.

$^{1}$ For instance, $\mathcal{F}$ is the trace on $\Omega$ of the usual Borel $\sigma$-field $\bigotimes_{t=t_0}^{T} \mathcal{B}(\mathbb{R}^q)$.

Hereafter, we shall assume that the random process $w(\cdot)$ is independent and identically distributed (i.i.d.) under probability $\mathbb{P}$. In other words, we suppose that the probability is the product $\mathbb{P} = \bigotimes_{t=t_0}^{T} \mu$ of a common marginal distribution $\mu$. The expectation operator $\mathbb{E}$ is defined on the set of measurable and integrable functions by

$$\mathbb{E}[g] = \mathbb{E}_{\mathbb{P}}\big[g(w(\cdot))\big] = \int_{\Omega} g\big(w(t_0), \dots, w(T)\big)\, d\mu\big(w(t_0)\big) \cdots d\mu\big(w(T)\big),$$

and we have that

$$\mathbb{E}_{\mathbb{P}}\big[g(w(t))\big] = \mathbb{E}_{\mu}\big[g(w(t))\big].$$
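When the marginal $\mu$ has finite support, the integral above reduces to a finite sum over scenarios, and the identity $\mathbb{E}_{\mathbb{P}}[g(w(t))] = \mathbb{E}_{\mu}[g(w(t))]$ can be checked by enumeration. The following Python sketch assumes a hypothetical three-point marginal (the one of Section 4, with an illustrative value $p = 0.1$); the helper names `expectation_product` and `expectation_marginal` are illustrative only.

```python
import itertools
import math

# Hypothetical finite marginal distribution mu on {-1, 0, 1} (assumption: p = 0.1).
p = 0.1
mu = {-1: p, 0: 1 - 2 * p, 1: p}


def expectation_product(g, mu, length):
    """E_P[g(w(.))] for the product measure P = mu x ... x mu (`length` factors),
    computed by exact enumeration of all scenarios."""
    total = 0.0
    for scenario in itertools.product(mu, repeat=length):
        weight = math.prod(mu[w] for w in scenario)  # P({scenario}) under i.i.d.
        total += weight * g(scenario)
    return total


def expectation_marginal(h, mu):
    """E_mu[h(w)] for a single coordinate w distributed according to mu."""
    return sum(prob * h(w) for w, prob in mu.items())


# A function of the third coordinate only: g(w(.)) = h(w(t0 + 2)) with h(w) = w**2.
lhs = expectation_product(lambda s: s[2] ** 2, mu, length=4)
rhs = expectation_marginal(lambda w: w ** 2, mu)
```

Both `lhs` and `rhs` equal $2p$ up to rounding: the coordinates on which $g$ does not depend integrate out to 1 under the product measure.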
2.2 Controls and feedback strategies 

It is well known that control issues in the uncertain case are much more complicated than in the deterministic case. In the uncertain context, we must drop the idea that the knowledge of open-loop decisions $u(\cdot) = (u(t_0), \dots, u(T-1))$ induces one single path of sequential states $x(\cdot) = (x(t_0), \dots, x(T))$. Open-loop controls $u(t)$ depending only upon time $t$ are no longer relevant, unlike closed-loop or feedback controls $u(t, x(t))$, which display more adaptive properties by taking into account the uncertain state evolution $x(t)$. In the stochastic setting, all the objects considered will be implicitly equipped with appropriate measurability properties. Thus we define a feedback as an element of the set of all measurable functions from the couples time-state towards the controls:

$$\mathfrak{U} := \big\{u : (t, x) \in \{t_0, \dots, T-1\} \times \mathbb{X} \mapsto u(t, x) \in \mathbb{U}, \ u \text{ measurable}\big\}. \tag{9}$$

The control constraints restrict feedbacks to admissible feedbacks, accounting for control constraints (2), as follows:

$$\mathfrak{U}^{ad} = \big\{u \in \mathfrak{U} \;\big|\; u(t, x) \in \mathbb{B}(t, x), \ \forall (t, x) \in \{t_0, \dots, T-1\} \times \mathbb{X}\big\}. \tag{10}$$

Let us mention that, in the stochastic context, a feedback decision is also termed a pure Markovian strategy. Markovian means that the current state contains all the information about past system evolution that is needed to determine the statistical distribution of future states. Thus, among the whole sequence of past states $x(t_0), \dots, x(t)$, only the current state $x(t)$ is needed in the feedback loop.

At this stage, we need to introduce some notations which will prove quite useful in the sequel: the state map and the control map. Given a feedback $u \in \mathfrak{U}$, a scenario $w(\cdot) \in \Omega$ and an initial state $x_0$ at time $t_0 \in \{t_0, \dots, T-1\}$, the solution state $x_f[t_0, x_0, u, w(\cdot)]$ is the state path $x(\cdot) = (x(t_0), x(t_0+1), \dots, x(T))$ solution of the dynamics

$$x(t+1) = f\big(t, x(t), u(t, x(t)), w(t)\big), \quad t = t_0, \dots, T-1,$$

starting from the initial condition $x(t_0) = x_0$ at time $t_0$ and associated with feedback control $u$ and scenario $w(\cdot)$. The solution control $u_f[t_0, x_0, u, w(\cdot)]$ is the associated decision path $u(\cdot) = (u(t_0), u(t_0+1), \dots, u(T-1))$, where $u(t) = u(t, x(t))$.
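The state and control maps can be computed by a simple forward loop once a feedback and a scenario are fixed. The Python sketch below is illustrative: `solution_maps`, `f_example` and `feedback_example` are hypothetical names, the scenario is stored as a tuple whose entry `k` is $w(t_0 + k)$, and the example feedback is an arbitrary admissible rule for the scalar dynamics of Section 4.

```python
from typing import Callable, List, Sequence, Tuple


def solution_maps(
    f: Callable[[int, float, float, float], float],
    feedback: Callable[[int, float], float],
    t0: int,
    x0: float,
    scenario: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (x_f[t0, x0, u, w(.)], u_f[t0, x0, u, w(.)]).

    scenario[k] holds w(t0 + k), so the scenario (w(t0), ..., w(T)) has
    T - t0 + 1 entries; the last value w(T) is not used by the dynamics.
    """
    T = t0 + len(scenario) - 1
    states, controls = [x0], []
    x = x0
    for t in range(t0, T):
        u = feedback(t, x)                  # closed loop: u(t) = u(t, x(t))
        x = f(t, x, u, scenario[t - t0])    # x(t+1) = f(t, x(t), u(t), w(t))
        controls.append(u)
        states.append(x)
    return states, controls


def f_example(t, x, u, w):
    """Scalar dynamics x(t+1) = x(t) + u(t) + w(t) of Section 4."""
    return x + u + w


def feedback_example(t, x):
    """Hypothetical feedback pushing the state back towards 0."""
    return 1 if x < 0 else -1


states, controls = solution_maps(f_example, feedback_example, 0, 0, (1, 0, -1, 0))
```

With the scenario $(1, 0, -1, 0)$ and $x_0 = 0$, this yields the state path $(0, 0, -1, -1)$ and the decision path $(-1, -1, 1)$. Passing the feedback as a function of $(t, x)$, rather than a precomputed control sequence, is what makes the decision path depend on the realized scenario.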



2.3 The stochastic viability kernel and viable feedbacks 

The viability kernel plays a major role in viability analysis. In the deterministic case, it is the set of initial states $x_0$ such that the state constraints hold true for at least one control strategy. In the probabilistic setting, one relaxes the constraint requirements by satisfying the state constraints along time with a given confidence level, as in (8). In the following definition, we give proper mathematical content to formula (8), inspired by chance constraints [14].

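Definition 1 The stochastic viability kernel at time $t_0$ and at confidence level $\beta \in \,]0, 1]$ is

$$\mathrm{Viab}_{\beta}(t_0) := \Big\{x_0 \in \mathbb{X} \;\Big|\; \text{there exists } u \in \mathfrak{U}^{ad} \text{ such that } \mathbb{P}\Big(\big\{w(\cdot) \in \Omega \;\big|\; x(t) \in \mathbb{A}\big(t, w(t)\big) \text{ for } t = t_0, \dots, T\big\}\Big) \geq \beta\Big\}, \tag{11}$$

where $x(t)$ is a shorthand for the solution map $x(t) = x_f[t_0, x_0, u, w(\cdot)](t)$.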

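Stochastic viable feedbacks are measurable feedback controls that allow the stochastic viability property to hold true.

Definition 2 Stochastic viable feedbacks are those for which the above relations hold:

$$\mathfrak{U}^{viab}_{\beta}(t_0, x_0) := \Big\{u \in \mathfrak{U}^{ad} \;\Big|\; \mathbb{P}\Big(\big\{w(\cdot) \in \Omega \;\big|\; x(t) \in \mathbb{A}\big(t, w(t)\big) \text{ for } t = t_0, \dots, T\big\}\Big) \geq \beta\Big\}. \tag{12}$$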

We have the following strong link between stochastic viable feedbacks and the viability kernel:

$$x_0 \in \mathrm{Viab}_{\beta}(t_0) \iff \mathfrak{U}^{viab}_{\beta}(t_0, x_0) \neq \emptyset.$$
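For a finite noise support, the probability in (11) and (12) can be computed exactly by enumerating the $|\mathbb{S}|^{T-t_0+1}$ scenarios, which gives a direct (if exponential-time) membership test for $\mathfrak{U}^{viab}_{\beta}(t_0, x_0)$. The sketch below assumes time-invariant finite supports and uses the example of Section 4 with a hypothetical value $p = 0.1$; the function names are illustrative, and admissibility $u \in \mathfrak{U}^{ad}$ is assumed rather than checked.

```python
import itertools
import math

p = 0.1                                   # hypothetical noise parameter
mu = {-1: p, 0: 1 - 2 * p, 1: p}          # marginal distribution of w(t)


def f_example(t, x, u, w):
    return x + u + w


def in_A_example(t, x, w):
    """State constraint x in A(t, w) = {-1, 0, 1} (independent of t and w here)."""
    return x in (-1, 0, 1)


def viability_probability(f, in_A, feedback, t0, T, x0, mu):
    """P({w(.) in Omega | x(t) in A(t, w(t)) for t = t0..T}) for a fixed feedback."""
    total = 0.0
    for scenario in itertools.product(mu, repeat=T - t0 + 1):
        x, viable = x0, True
        for t in range(t0, T + 1):
            w = scenario[t - t0]
            if not in_A(t, x, w):
                viable = False
                break
            if t < T:
                x = f(t, x, feedback(t, x), w)
        if viable:
            total += math.prod(mu[w] for w in scenario)
    return total


def is_viable_feedback(f, in_A, feedback, t0, T, x0, mu, beta):
    """Membership test u in U^viab_beta(t0, x0) of Definition 2."""
    return viability_probability(f, in_A, feedback, t0, T, x0, mu) >= beta


def always_plus_one(t, x):
    """Hypothetical constant feedback u(t, x) = 1."""
    return 1
```

For instance, from $x_0 = 0$ with the constant control $u \equiv 1$ and $T - t_0 = 1$, the state leaves $\mathbb{A}$ only when $w(t_0) = 1$, so the computed probability is $1 - p$, and this feedback belongs to $\mathfrak{U}^{viab}_{\beta}(t_0, 0)$ exactly when $\beta \leq 1 - p$.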

Of particular interest is the case where the confidence level is $\beta = 1$, which is very close to robust viability and control. Indeed, when the scenario domain $\Omega$ is countable and every scenario $w(\cdot)$ has strictly positive probability under $\mathbb{P}$, $\mathrm{Viab}_1(t_0)$ is the robust viability kernel (the set of initial states $x_0$ such that the state constraints hold true for at least one control strategy, whatever the scenario). When the uncertainty domain $\mathbb{S}(t)$ in (5) is reduced to a single element, so is the scenario domain $\Omega$ in (7): this is the deterministic case, for which $\mathrm{Viab}_1(t_0)$ coincides with the classical viability kernel [1, 10].
3 Stochastic dynamic programming equation



We shall now exhibit a characterization of stochastic viability in terms of dynamic programming. It relies on the maximal viability probability, defined recursively as follows.

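Definition 3 Assume that the random process $w(\cdot)$ is i.i.d. under probability $\mathbb{P}$, with marginal distribution $\mu$. The stochastic viability value function $V(t, x)$ associated with dynamics (1), control constraints (2), state constraints (3) and target constraints (4) is defined by the following backward induction:

$$\begin{cases} V(T, x) := \mathbb{E}_{\mu}\big[\mathbf{1}_{\mathbb{A}(T, w)}(x)\big], \\[4pt] V(t, x) := \displaystyle\max_{u \in \mathbb{B}(t, x)} \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, V\big(t+1, f(t, x, u, w)\big)\Big], \quad t = t_0, \dots, T-1. \end{cases} \tag{13}$$

Here, $\mathbf{1}_A$ stands for the indicator function of a set $A$, defined by $\mathbf{1}_A(x) = 1$ if $x \in A$, and $\mathbf{1}_A(x) = 0$ if $x \notin A$.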
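On finite state, control and noise sets, the backward induction (13) becomes a finite computation. The Python sketch below is one possible implementation under stated assumptions: the grid `states` is finite, any state reached outside the grid never satisfies the state constraint (so its value is 0), and ties in the maximization are kept as a set of controls, within a small tolerance. The names `stochastic_viability_dp`, `in_A` and `controls` are hypothetical, and the value $p = 0.1$ is illustrative.

```python
def stochastic_viability_dp(f, in_A, controls, states, mu, t0, T, tol=1e-12):
    """Backward induction (13) on a finite grid.

    Assumption: states outside `states` never satisfy the state constraint,
    hence have value 0. Returns V[t][x] and the sets of maximizing controls.
    """
    # Terminal condition: V(T, x) = E_mu[1_{A(T, w)}(x)].
    V = {T: {x: sum(pw for w, pw in mu.items() if in_A(T, x, w)) for x in states}}
    argmax = {}
    for t in range(T - 1, t0 - 1, -1):
        V[t], argmax[t] = {}, {}
        for x in states:
            # q(u) = E_mu[1_{A(t, w)}(x) V(t+1, f(t, x, u, w))]
            q = {
                u: sum(pw * V[t + 1].get(f(t, x, u, w), 0.0)
                       for w, pw in mu.items() if in_A(t, x, w))
                for u in controls(t, x)
            }
            best = max(q.values())
            V[t][x] = best
            argmax[t][x] = {u for u, value in q.items() if value >= best - tol}
    return V, argmax


# Example of Section 4 with a hypothetical p = 0.1 and horizon T = 5.
p = 0.1
mu = {-1: p, 0: 1 - 2 * p, 1: p}
V, argmax = stochastic_viability_dp(
    f=lambda t, x, u, w: x + u + w,
    in_A=lambda t, x, w: x in (-1, 0, 1),
    controls=lambda t, x: (-1, 1),
    states=range(-2, 3),
    mu=mu, t0=0, T=5,
)
```

On this example, the maximizing sets have the structure of (17): $\{1\}$ at $x = -1$, $\{-1\}$ at $x = 1$ and both controls at $x = 0$. The cost is linear in the horizon and in $|\text{states}| \times |\text{controls}| \times |\text{supp}\,\mu|$, in contrast with the exponential scenario enumeration of the definitions.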

The backward dynamic programming equation (13) allows us to define the value function $V(t, x)$. By writing a max instead of a sup, we implicitly assume the existence of an optimal solution for each time $t$ and state $x$. It turns out that the stochastic viability value function $V(t_0, x)$ at time $t_0$ is related to the stochastic viability kernels $\{\mathrm{Viab}_{\beta}(t_0),\ \beta \in [0, 1]\}$, and that the dynamic programming induction reveals relevant stochastic feedback controls. To establish this, we first claim that the value function $V$ is the solution of a (stochastic) optimal control problem involving the viability criterion $\pi$ defined as follows:

$$\pi\big(t_0, x(\cdot), u(\cdot), w(\cdot)\big) = \prod_{t=t_0}^{T} \mathbf{1}_{\mathbb{A}(t, w(t))}\big(x(t)\big). \tag{14}$$

Proposition 1 Assume that the random process $w(\cdot)$ is i.i.d. under probability $\mathbb{P}$, with marginal distribution $\mu$. For any initial conditions $(t_0, x_0)$, we have

$$V(t_0, x_0) = \max_{u \in \mathfrak{U}^{ad}} \mathbb{E}_{\mathbb{P}}\Big[\pi\big(t_0, x(\cdot), u(\cdot), w(\cdot)\big)\Big],$$

where the stochastic viability value function $V(t_0, x_0)$ is given by the backward induction (13), the criterion $\pi$ is defined in (14), and $x(\cdot) = x_f[t_0, x_0, u, w(\cdot)](\cdot)$ and $u(\cdot) = u_f[t_0, x_0, u, w(\cdot)]$ are shorthand expressions for the solution maps.
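Proposition 1 can be checked numerically on a small instance by comparing the backward induction with a brute-force maximization of $\mathbb{E}_{\mathbb{P}}[\pi]$ over all deterministic feedbacks. The sketch below does this for the example of Section 4 with a hypothetical $p = 0.1$ and a short horizon. It only enumerates feedback values on $\{t_0, \dots, T-1\} \times \mathbb{A}$, since once the state leaves $\mathbb{A}$ the criterion (14) is already 0; all names are illustrative.

```python
import itertools
import math

P_NOISE = 0.1                                   # hypothetical p
MU = {-1: P_NOISE, 0: 1 - 2 * P_NOISE, 1: P_NOISE}
A = (-1, 0, 1)
CONTROLS = (-1, 1)


def dp_value(t0, T, x0):
    """V(t0, x0) from the backward induction (13) for x(t+1) = x + u + w."""
    V = {x: 1.0 for x in A}                     # V(T, .) on A, 0 elsewhere
    for _ in range(T - t0):
        V = {x: max(sum(pw * V.get(x + u + w, 0.0) for w, pw in MU.items())
                    for u in CONTROLS)
             for x in A}
    return V.get(x0, 0.0)


def expected_criterion(feedback, t0, T, x0):
    """E_P[pi(t0, x(.), u(.), w(.))], with pi the product of indicators (14)."""
    total = 0.0
    for scenario in itertools.product(MU, repeat=T - t0 + 1):
        x, pi = x0, 1.0
        for t in range(t0, T + 1):
            if x not in A:                      # one indicator is 0, so pi = 0
                pi = 0.0
                break
            if t < T:
                x = x + feedback[(t, x)] + scenario[t - t0]
        total += pi * math.prod(MU[w] for w in scenario)
    return total


def brute_force_value(t0, T, x0):
    """max of E_P[pi] over all feedbacks u(t, x) in {-1, 1} on {t0..T-1} x A."""
    keys = [(t, x) for t in range(t0, T) for x in A]
    return max(
        expected_criterion(dict(zip(keys, choice)), t0, T, x0)
        for choice in itertools.product(CONTROLS, repeat=len(keys))
    )
```

For instance, `dp_value(0, 3, x0)` and `brute_force_value(0, 3, x0)` agree up to rounding for every $x_0 \in \{-2, \dots, 2\}$. The brute force already enumerates $2^9$ feedbacks for $T - t_0 = 3$, which is why the horizon is kept short.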

The proof of Proposition 1 is given in Appendix A. We also derive the following assertion regarding the stochastic viability kernel.

Proposition 2 Assume that the random process $w(\cdot)$ is i.i.d. under probability $\mathbb{P}$, with marginal distribution $\mu$. The stochastic viability kernel at confidence level $\beta$ is the upper level set of level $\beta$ of the stochastic viability value function:

$$V(t_0, x_0) \geq \beta \iff x_0 \in \mathrm{Viab}_{\beta}(t_0). \tag{15}$$
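Proposition 2 turns the computation of the kernel into a thresholding of the value function. A minimal sketch for the example of Section 4, with a hypothetical $p = 0.1$ (the names `value_function` and `stochastic_viability_kernel` are illustrative):

```python
P_NOISE = 0.1                                   # hypothetical p
MU = {-1: P_NOISE, 0: 1 - 2 * P_NOISE, 1: P_NOISE}
A = (-1, 0, 1)
CONTROLS = (-1, 1)


def value_function(t0, T):
    """V(t0, .) on A from the backward induction (13); V is 0 outside A."""
    V = {x: 1.0 for x in A}
    for _ in range(T - t0):
        V = {x: max(sum(pw * V.get(x + u + w, 0.0) for w, pw in MU.items())
                    for u in CONTROLS)
             for x in A}
    return V


def stochastic_viability_kernel(V, beta):
    """Viab_beta(t0) = {x0 : V(t0, x0) >= beta}, as in (15)."""
    return {x for x, value in V.items() if value >= beta}


V5 = value_function(0, 5)
kernels = {beta: stochastic_viability_kernel(V5, beta) for beta in (0.5, 0.999)}
```

Increasing $\beta$ can only shrink the kernel. On this example, sweeping $\beta$ moves the kernel from $\mathbb{A}$ to $\{-1, 1\}$ and then to $\emptyset$, which is the structure stated in Result 2 below.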
The proof of Proposition 2 is also given in Appendix A. As regards the viable feedbacks, we obtain the following assertion.

Proposition 3 Assume that the random process $w(\cdot)$ is i.i.d. under probability $\mathbb{P}$, with marginal distribution $\mu$. For any time $t = t_0, \dots, T-1$ and state $x$, let us assume that

$$\mathbb{B}^{viab}(t, x) := \arg\max_{u \in \mathbb{B}(t, x)} \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, V\big(t+1, f(t, x, u, w)\big)\Big]$$

is not empty. Then, for any $x_0 \in \mathrm{Viab}_{\beta}(t_0)$, any measurable selection$^{2}$ $u^{\star} \in \mathbb{B}^{viab}$ belongs to the set of stochastic viable feedbacks $\mathfrak{U}^{viab}_{\beta}(t_0, x_0)$.
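Proposition 3 can likewise be illustrated numerically: select one control in each argmax set $\mathbb{B}^{viab}(t, x)$, then compute the exact viability probability of the resulting feedback by scenario enumeration. It should coincide with $V(t_0, x_0)$, so the selection belongs to $\mathfrak{U}^{viab}_{\beta}(t_0, x_0)$ for every $\beta \leq V(t_0, x_0)$. The sketch uses the example of Section 4 with a hypothetical $p = 0.1$, a tolerance for ties, and the smallest maximizer as an arbitrary selection rule; all names are illustrative.

```python
import itertools
import math

P_NOISE = 0.1                                   # hypothetical p
MU = {-1: P_NOISE, 0: 1 - 2 * P_NOISE, 1: P_NOISE}
A = (-1, 0, 1)
CONTROLS = (-1, 1)


def viable_control_sets(t0, T, tol=1e-12):
    """V(t, .) on A and the argmax sets B^viab(t, x) from the backward induction (13)."""
    V = {T: {x: 1.0 for x in A}}
    B = {}
    for t in range(T - 1, t0 - 1, -1):
        V[t], B[t] = {}, {}
        for x in A:
            q = {u: sum(pw * V[t + 1].get(x + u + w, 0.0) for w, pw in MU.items())
                 for u in CONTROLS}
            V[t][x] = max(q.values())
            B[t][x] = {u for u in CONTROLS if q[u] >= V[t][x] - tol}
    return V, B


def smallest_selection(B):
    """One selection u*(t, x) in B^viab(t, x): the smallest maximizer (arbitrary rule)."""
    return lambda t, x: min(B[t][x])


def viability_probability(feedback, t0, T, x0):
    """Exact P(x(t) in A for t = t0..T) under `feedback`, by scenario enumeration."""
    total = 0.0
    for scenario in itertools.product(MU, repeat=T - t0 + 1):
        x, viable = x0, True
        for t in range(t0, T + 1):
            if x not in A:
                viable = False
                break
            if t < T:
                x = x + feedback(t, x) + scenario[t - t0]
        if viable:
            total += math.prod(MU[w] for w in scenario)
    return total


V, B = viable_control_sets(0, 4)
u_star = smallest_selection(B)
```

Any other selection rule (for instance the largest maximizer at $x = 0$) gives the same probabilities here, in line with the non-uniqueness of the viable control at $x = 0$.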

4 A simple academic example 

To illustrate the general statements, we consider a simple academic model and 
perform a probabilistic viability analysis. 

4.1 Example statement 

The evolution of a scalar $x(t)$ is governed by the discrete-time dynamics

$$x(t+1) = x(t) + u(t) + w(t),$$

where the control is constrained by

$$u(t) \in \{-1, 1\} = \mathbb{B}(t, x) = \mathbb{B},$$

and the uncertainty scenarios are induced by

$$w(t) \in \{-1, 0, 1\} = \mathbb{S}(t) = \mathbb{S}.$$

We assume that $w(\cdot)$ is an i.i.d. sequence, with probability

$$\mu\big(w(t) = 1\big) = \mu\big(w(t) = -1\big) = p, \qquad \mu\big(w(t) = 0\big) = 1 - 2p.$$

The state constraint is

$$x(t) \in \{-1, 0, 1\} = \mathbb{A}\big(t, w(t)\big) = \mathbb{A}.$$

The decision maker intends to exhibit controls such that this constraint is satisfied with a high enough probability:

$$\mathbb{P}\big(x(t) \in \{-1, 0, 1\},\ t = t_0, \dots, T\big) \geq \beta. \tag{16}$$



$^{2}$ Any $u^{\star} \in \mathfrak{U}$ such that $u^{\star}(t, x) \in \mathbb{B}^{viab}(t, x)$ for any $t$ and $x$.
The intuition to satisfy the above probability constraint is as follows. When $x(t)$ belongs to the border $\{-1, 1\}$ of the domain $\mathbb{A} = \{-1, 0, 1\}$, there is an obvious decision to make: if $x(t) = -1$, take $u(t) = 1$ so that $x(t) + u(t) = 0$ and thus $x(t+1) = w(t) \in \{-1, 0, 1\}$ (likewise with $x(t) = 1$ and $u(t) = -1$). But when $x(t) = 0$, then $x(t+1) = u(t) + w(t)$ and, whatever $u(t) \in \{-1, 1\}$, there is a chance that $w(t)$ takes the same value, sending $x(t+1)$ outside $\mathbb{A} = \{-1, 0, 1\}$.



4.2 Results 




Figure 1: 9 simulations of state trajectories $x(t)$ over time horizon $[0, 40]$ for dynamics $x(t+1) = x(t) + u(t) + w(t)$ starting from $x_0 = 0$ with stochastic viable feedback controls $u^{\star}(t, x) \in \mathbb{B}^{viab}(t, x)$ as defined in (17). The probability of facing high disturbances $w \in \{-1, +1\}$ is low, with $p = 1\%$. The viability probability value function is $V(0, 0) \approx 67\%$, and 3 trajectories out of 9 violate the constraint.



Using the dynamic programming equation (13), we compute the maximal viability probability $V(t, x)$ and the associated viable feedback controls $\mathbb{B}^{viab}(t, x)$.

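Result 1 Introduce the matrix $M$, the vector $\mathbf{1}$ and the vectors $\mathbf{l}_i(x)$, $i \in \{-1, 0, 1\}$, by

$$M = \begin{pmatrix} p & 1-2p & p \\ p & 1-2p & 0 \\ p & 1-2p & p \end{pmatrix}, \qquad \mathbf{1} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \mathbf{l}_{-1}(x) = \begin{pmatrix} \mathbf{1}_{\{x=-1\}} \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{l}_{0}(x) = \begin{pmatrix} 0 \\ \mathbf{1}_{\{x=0\}} \\ 0 \end{pmatrix}, \quad \mathbf{l}_{1}(x) = \begin{pmatrix} 0 \\ 0 \\ \mathbf{1}_{\{x=1\}} \end{pmatrix}.$$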
The stochastic viability value function is given by

$$V(t, x) = \sum_{i=-1,0,1} \big\langle \mathbf{l}_i(x), M^{T-t}\mathbf{1} \big\rangle$$

or, in other words, $V(t, x) = 0$ for all $x \notin \{-1, 0, 1\}$ and

$$V(t, x) = \big(M^{T-t}\mathbf{1}\big)_{x+2}, \quad \forall x \in \{-1, 0, 1\}.$$

The associated viable feedback controls are given by

$$\mathbb{B}^{viab}(t, x) = \begin{cases} \{1\} & \text{if } x = -1, \\ \{-1, 1\} & \text{if } x = 0, \\ \{-1\} & \text{if } x = 1. \end{cases} \tag{17}$$

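Consequently, the viability kernel reads:

Result 2

$$\mathrm{Viab}_{\beta}(t) = \begin{cases} \mathbb{A} & \text{if } \beta \leq \big(M^{T-t}\mathbf{1}\big)_2, \\ \{-1, 1\} & \text{if } \big(M^{T-t}\mathbf{1}\big)_2 < \beta \leq \big(M^{T-t}\mathbf{1}\big)_1, \\ \emptyset & \text{if } \big(M^{T-t}\mathbf{1}\big)_1 < \beta. \end{cases}$$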

The difficulty of the control is captured by the second row of the matrix $M$, whose sum is not equal to 1, which suggests that the state $x = 0$ can escape from $\mathbb{A}$. The results are illustrated by Figure 1, where 9 simulations of state trajectories $x(t)$ starting from $x_0 = 0$ are displayed over time horizon $[0, 40]$ with stochastic viable feedback controls $u^{\star}(t, x) \in \mathbb{B}^{viab}(t, x)$ as defined in (17). The probability of facing high disturbances $w \in \{-1, +1\}$ is low, with $p = 1\%$. However, the viability probability value function turns out to be $V(0, 0) \approx 67\%$, which points out a significant risk of leaving the viability set $\mathbb{A} = \{-1, 0, 1\}$ due to the accumulation of risks over 40 periods. It is therefore not surprising that 3 paths out of 9 leave the state constraint set $\mathbb{A}$ along time.



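PROOF. We shall check that $V(t, x) = \sum_{i=-1,0,1} \langle \mathbf{l}_i(x), M^{T-t}\mathbf{1} \rangle$ is the solution to the dynamic programming equation (13). This is true for the final time $t = T$ because

$$\sum_{i=-1,0,1} \big\langle \mathbf{l}_i(x), M^{T-T}\mathbf{1} \big\rangle = \sum_{i=-1,0,1} \big\langle \mathbf{l}_i(x), \mathbf{1} \big\rangle = \sum_{i=-1,0,1} \mathbf{1}_{\{x=i\}} = \mathbf{1}_{\{-1,0,1\}}(x) = \mathbf{1}_{\mathbb{A}}(x).$$

Proceeding by backward induction, let us suppose that

$$V(t+1, x) = \sum_{i=-1,0,1} \big\langle \mathbf{l}_i(x), M^{T-(t+1)}\mathbf{1} \big\rangle.$$

The dynamic programming equation (13) gives

$$V(t, x) = \mathbf{1}_{\{-1,0,1\}}(x) \max_{u \in \{-1, 1\}} \mathbb{E}_{\mu}\big[V(t+1, x + u + w)\big].$$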
Whenever $x \notin \mathbb{A} = \{-1, 0, 1\}$, we clearly have that $V(t, x) = 0$. Whenever $x = -1$, we deduce that

$$\begin{aligned} V(t, -1) &= \max\big\{p V(t+1, -3) + (1-2p) V(t+1, -2) + p V(t+1, -1),\ p V(t+1, -1) + (1-2p) V(t+1, 0) + p V(t+1, 1)\big\} \\ &= \max\big\{p V(t+1, -1) + (1-2p) V(t+1, 0) + p V(t+1, 1),\ p V(t+1, -1)\big\} \\ &= p V(t+1, -1) + (1-2p) V(t+1, 0) + p V(t+1, 1), \end{aligned}$$

and the viable control is provided by $u^{\star}(t, -1) = 1$. By induction, we deduce that

$$\begin{aligned} V(t, -1) &= p V(t+1, -1) + (1-2p) V(t+1, 0) + p V(t+1, 1) \\ &= \sum_{j=-1,0,1} M_{1, j+2} \big(M^{T-(t+1)}\mathbf{1}\big)_{j+2} \\ &= \big(M M^{T-(t+1)}\mathbf{1}\big)_1 \\ &= \sum_{i=-1,0,1} \big\langle \mathbf{l}_i(-1), M^{T-t}\mathbf{1} \big\rangle. \end{aligned}$$

In the same way, we check the expression for the stochastic viability value function $V(t, 1)$ when $x = 1$, and obtain the viable control $u^{\star}(t, 1) = -1$. The case $x = 0$ is treated in the same vein, with the difference that the viable control is not unique since $u^{\star}(t, 0) \in \{-1, +1\}$ and

$$V(t, 0) = p V(t+1, -1) + (1-2p) V(t+1, 0).$$
A Proofs



A.1 Proof of Proposition 1

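We use the following notations for any strategy $u \in \mathfrak{U}$:

• $\pi^u$ is the evaluation of the criterion $\pi$ defined in (14):

$$\pi^u\big(t_0, x_0, w(\cdot)\big) := \pi\big(t_0, x_f[t_0, x_0, u, w(\cdot)](\cdot), u_f[t_0, x_0, u, w(\cdot)](\cdot), w(\cdot)\big), \tag{18}$$

where $w(\cdot) \in \Omega$ and $x_f$, $u_f$ are the solution maps;

• $\pi^u_{\mathbb{E}}$ is the expected value

$$\pi^u_{\mathbb{E}}(t_0, x_0) := \mathbb{E}_{\mathbb{P}}\big[\pi^u(t_0, x_0, w(\cdot))\big]. \tag{19}$$

We consider the maximization problem:

$$\pi^{\star}_{\mathbb{E}}(t_0, x_0) := \max_{u \in \mathfrak{U}^{ad}} \pi^u_{\mathbb{E}}(t_0, x_0). \tag{20}$$

We aim at proving that

$$V(t, x) = \pi^{u^{\star}}_{\mathbb{E}}(t, x) = \max_{u \in \mathfrak{U}^{ad}} \pi^u_{\mathbb{E}}(t, x).$$

Let $u^{\star} \in \mathfrak{U}^{ad}$ denote one of the measurable viable feedback strategies given by the dynamic programming equation (13). We perform a backward induction to prove this equality.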

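First, the equality at $t = T$ holds true since

$$\begin{aligned} \pi^{u^{\star}}_{\mathbb{E}}(T, x) &= \mathbb{E}_{\mathbb{P}}\big[\pi^{u^{\star}}(T, x, w(\cdot))\big] && \text{by definition (19)} \\ &= \mathbb{E}_{\mu}\big[\mathbf{1}_{\mathbb{A}(T, w)}(x)\big] && \text{by definition (14)} \\ &= V(T, x) && \text{by definition (13).} \end{aligned}$$

Now, suppose that

$$\pi^{u^{\star}}_{\mathbb{E}}(t+1, x) = \max_{u \in \mathfrak{U}^{ad}} \pi^u_{\mathbb{E}}(t+1, x) = V(t+1, x). \tag{21}$$

The very definition (13) of the value function $V$ by dynamic programming, combined with (22) in Lemma 1 (proved below), implies that

$$\begin{aligned} \pi^{u^{\star}}_{\mathbb{E}}(t, x) &= \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, \pi^{u^{\star}}_{\mathbb{E}}\big(t+1, f(t, x, u^{\star}(t, x), w)\big)\Big] && \text{by (22)} \\ &= \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, V\big(t+1, f(t, x, u^{\star}(t, x), w)\big)\Big] && \text{by (21)} \\ &= \max_{u \in \mathbb{B}(t, x)} \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, V\big(t+1, f(t, x, u, w)\big)\Big] && \text{by (13)} \\ &= V(t, x) && \text{by (13).} \end{aligned}$$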
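Similarly, for any $u \in \mathfrak{U}^{ad}$, we obtain

$$\begin{aligned} \pi^u_{\mathbb{E}}(t, x) &= \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, \pi^u_{\mathbb{E}}\big(t+1, f(t, x, u(t, x), w)\big)\Big] && \text{by (22)} \\ &\leq \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, V\big(t+1, f(t, x, u(t, x), w)\big)\Big] && \text{by (21)} \\ &\leq \max_{v \in \mathbb{B}(t, x)} \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, V\big(t+1, f(t, x, v, w)\big)\Big] && \text{since } u(t, x) \in \mathbb{B}(t, x) \\ &= V(t, x) && \text{by (13).} \end{aligned}$$

Consequently, the desired statement is obtained since

$$\max_{u \in \mathfrak{U}^{ad}} \pi^u_{\mathbb{E}}(t, x) \leq V(t, x) = \pi^{u^{\star}}_{\mathbb{E}}(t, x)$$

yields the equality

$$V(t, x) = \pi^{u^{\star}}_{\mathbb{E}}(t, x) = \max_{u \in \mathfrak{U}^{ad}} \pi^u_{\mathbb{E}}(t, x).$$

Lemma 1 We have, for $t = t_0, \dots, T-1$ and $u \in \mathfrak{U}$,

$$\begin{aligned} \pi^u_{\mathbb{E}}(T, x) &= \mathbb{E}_{\mu}\big[\mathbf{1}_{\mathbb{A}(T, w)}(x)\big], \\ \pi^u_{\mathbb{E}}(t, x) &= \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, \pi^u_{\mathbb{E}}\big(t+1, f(t, x, u(t, x), w)\big)\Big]. \end{aligned} \tag{22}$$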



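PROOF. By (14) and (18), we have

$$\begin{aligned} \pi^u\big(T, x, w(\cdot)\big) &= \mathbf{1}_{\mathbb{A}(T, w(T))}(x), \\ \pi^u\big(t, x, w(\cdot)\big) &= \mathbf{1}_{\mathbb{A}(t, w(t))}(x)\, \pi^u\Big(t+1, f\big(t, x, u(t, x), w(t)\big), w(\cdot)\Big). \end{aligned} \tag{23}$$

Notice that $\pi^u(t, x, w(\cdot))$ depends only upon the end $(w(t), \dots, w(T))$ of the scenario, and not upon the beginning $(w(t_0), \dots, w(t-1))$. We shall write this property abusively as

$$\pi^u\big(t, x, w(\cdot)\big) = \pi^u\big(t, x, (w(t), \dots, w(T))\big). \tag{24}$$

We have

$$\begin{aligned} \pi^u_{\mathbb{E}}(t, x) &= \mathbb{E}_{\mathbb{P}}\big[\pi^u(t, x, w(\cdot))\big] && \text{by (19)} \\ &= \mathbb{E}_{\mathbb{P}}\Big[\mathbf{1}_{\mathbb{A}(t, w(t))}(x)\, \pi^u\big(t+1, f(t, x, u(t, x), w(t)), w(\cdot)\big)\Big] && \text{by (23)} \\ &= \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w(t))}(x)\, \mathbb{E}_{\mu^{\otimes(T-t)}}\big[\pi^u\big(t+1, f(t, x, u(t, x), w(t)), (w(t+1), \dots, w(T))\big)\big]\Big] && \text{by the Fubini theorem} \\ &= \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, \mathbb{E}_{\mathbb{P}}\big[\pi^u\big(t+1, f(t, x, u(t, x), w), w(\cdot)\big)\big]\Big] && \text{by (24)} \\ &= \mathbb{E}_{\mu}\Big[\mathbf{1}_{\mathbb{A}(t, w)}(x)\, \pi^u_{\mathbb{E}}\big(t+1, f(t, x, u(t, x), w)\big)\Big] && \text{by (19).} \end{aligned}$$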



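Proof of Proposition 2

Since the criterion $\pi$ in (14) is the indicator function of the event $\{x(t) \in \mathbb{A}(t, w(t)) \text{ for } t = t_0, \dots, T\}$, its expectation $\mathbb{E}_{\mathbb{P}}[\pi]$ is the probability appearing in (11). It is therefore enough to remark that

$$\mathrm{Viab}_{\beta}(t_0) = \Big\{x_0 \in \mathbb{X} \;\Big|\; \max_{u \in \mathfrak{U}^{ad}} \mathbb{E}_{\mathbb{P}}\Big[\pi\big(t_0, x(\cdot), u(\cdot), w(\cdot)\big)\Big] \geq \beta\Big\} \tag{25}$$

and to apply Proposition 1.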



Proof of Proposition 3

Simply follow the proof of Proposition 1 step by step.